Data enhancement method for drugs under graph-structured representation
Yinjiang CAI, Guangjun XU, Xibo MA
Journal of Computer Applications, 2023, 43(4): 1136-1141. DOI: 10.11772/j.issn.1001-9081.2022040489
Small-sample data can cause machine learning models to over-fit. In drug development, most datasets are small, which greatly limits the application of machine learning techniques in this field. To address this problem, a graph-structure-based data enhancement method for drugs was proposed. The method perturbs existing samples to generate new, similar samples that expand the dataset. It consists of four sub-methods: node discarding based on the molecular backbone, edge discarding based on the molecular backbone, multi-sample splicing, and a hybrid strategy. Specifically, the backbone-based node and edge discarding methods perturb a drug molecule by applying a small number of deletion operations to its composition and structure; the multi-sample splicing method perturbs samples through an addition operation that combines different molecules; and the hybrid strategy method improves the diversity of the enhanced data by combining deletion and addition operations in a certain ratio. The proposed method improved the Area Under the receiver operating characteristic Curve (AUC) of the drug property prediction baseline model MG-BERT (Molecular Graph Bidirectional Encoder Representations from Transformers) by 1.94% to 12.49% on the public datasets BACE, BBBP, ToxCast and ClinTox. Experimental results demonstrate the effectiveness of the proposed method for enhancing small-sample drug data.
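The abstract does not give implementation details, so the following is only a minimal Python sketch of the four kinds of operation it names, written against a toy molecular graph. It is not the authors' code. Several points are assumptions: the names `MolGraph`, `drop_nodes`, `drop_edges`, `splice` and `augment` are hypothetical; "based on the molecular backbone" is interpreted here as protecting backbone atoms and bonds so that only off-backbone parts are deleted; splicing is modeled as a disjoint union of two graphs with no new bond; and the hybrid "ratio" is a single probability `delete_ratio`. How labels are assigned to spliced samples is not addressed.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph: atoms are nodes, bonds are undirected edges."""
    atoms: list[str]                      # atom symbols; list index = node id
    bonds: list[tuple[int, int]]          # undirected bonds as (i, j) node-id pairs
    backbone: set[int] = field(default_factory=set)  # ASSUMPTION: node ids treated as the backbone


def _num_to_drop(n_total: int, n_candidates: int, ratio: float) -> int:
    # Delete only a small number of items: at least one if any candidate exists,
    # never more than the candidates available.
    if n_candidates == 0:
        return 0
    return min(n_candidates, max(1, int(ratio * n_total)))


def drop_nodes(g: MolGraph, ratio: float, rng: random.Random) -> MolGraph:
    """Node discarding: remove a few non-backbone atoms and their incident bonds."""
    candidates = [i for i in range(len(g.atoms)) if i not in g.backbone]
    removed = set(rng.sample(candidates, _num_to_drop(len(g.atoms), len(candidates), ratio)))
    kept = [i for i in range(len(g.atoms)) if i not in removed]
    remap = {old: new for new, old in enumerate(kept)}  # compact node ids after deletion
    return MolGraph(
        atoms=[g.atoms[i] for i in kept],
        bonds=[(remap[i], remap[j]) for i, j in g.bonds if i in remap and j in remap],
        backbone={remap[i] for i in g.backbone},  # backbone atoms are never removed
    )


def drop_edges(g: MolGraph, ratio: float, rng: random.Random) -> MolGraph:
    """Edge discarding: remove a few bonds that are not inside the backbone."""
    # ASSUMPTION: a bond belongs to the backbone when both of its endpoints do.
    candidates = [k for k, (i, j) in enumerate(g.bonds)
                  if not (i in g.backbone and j in g.backbone)]
    removed = set(rng.sample(candidates, _num_to_drop(len(g.bonds), len(candidates), ratio)))
    return MolGraph(
        atoms=list(g.atoms),
        bonds=[b for k, b in enumerate(g.bonds) if k not in removed],
        backbone=set(g.backbone),
    )


def splice(a: MolGraph, b: MolGraph) -> MolGraph:
    """Multi-sample splicing (addition): combine two molecules into one sample.

    ASSUMPTION: a disjoint union; no bond is created between the two parts.
    """
    off = len(a.atoms)  # shift b's node ids past a's
    return MolGraph(
        atoms=a.atoms + b.atoms,
        bonds=a.bonds + [(i + off, j + off) for i, j in b.bonds],
        backbone=a.backbone | {i + off for i in b.backbone},
    )


def augment(dataset: list[MolGraph], n_new: int, delete_ratio: float,
            drop_ratio: float = 0.1, seed: int = 0) -> list[MolGraph]:
    """Hybrid strategy: mix deletion and addition operations in a fixed ratio."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        if rng.random() < delete_ratio:  # deletion branch: node or edge discarding
            op = rng.choice([drop_nodes, drop_edges])
            out.append(op(rng.choice(dataset), drop_ratio, rng))
        else:  # addition branch: splice two (possibly identical) samples
            out.append(splice(rng.choice(dataset), rng.choice(dataset)))
    return out
```

For example, on the heavy-atom graph of ethanol, `MolGraph(atoms=["C", "C", "O"], bonds=[(0, 1), (1, 2)], backbone={0, 1})` (the backbone choice is purely illustrative), `drop_nodes` can only remove the oxygen, while `splice` doubles the graph to six atoms. Every operation returns a new graph, so the original samples stay unchanged and can be reused across many augmentation rounds.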

Table and Figures | Reference | Related Articles | Metrics